Conversation
…into spectrum-feature-generator
…into spectrum-feature-generator
…into spectrum-feature-generator
…trum-feature-generator
pull main in spectrum-feature-generator
…pectrum-feature-generator
…mics/ms2rescore into spectrum-feature-generator
|
|
||
| # Fit calibration and transform all predictions for this run | ||
| calibration = SplineTransformerCalibration() | ||
| calibration.fit( |
There was a problem hiding this comment.
_get_calibration_data doesn't throw an exception if there are no target psms, contrary to _get_calibration_psms. However, unless I'm missing something, fit will still fail if observed_rt_calibration or predicted_rt_calibration are empty. So it should probably be handled here if one of the runs doesn't have any target psms.
| # Calibrate predictions per run | ||
| logger.info("Calibrating predicted retention times per run...") | ||
| for run in psm_list_df["run"].unique(): | ||
| run_df = psm_list_df[psm_list_df["run"] == run].copy() |
There was a problem hiding this comment.
run_df is copied twice, here, and in _get_calibration_data. On top of that, if it would be okay to sort psm_list_df, we could also avoid copying run_df at all, and we could just use:
calibration_df = run_df[~run_df["is_decoy"]].head(num_calibration_psms)I suspect sorting all psms, instead of only the targets might be faster than copying all targets if len(target_df) is considerably greater than num_calibration_psms. But of course, if psm_list_df shouldn't be sorted in place, then you can probably ignore this, except for the double copy.
No description provided.